Abstract

We consider the problem of revenue-optimal dynamic mechanism design in settings where 
agents' types evolve over time as a function of their (both public and private) experience with 
items that are auctioned repeatedly over an infinite horizon. A central question here is under- 
standing what natural restrictions on the environment permit the design of optimal mechanisms 
(note that even in the simpler static setting, optimal mechanisms are characterized only under 
certain restrictions). We provide a structural characterization of a natural "separable" multi- 
armed bandit environment (where the evolution and incentive structure of the a-priori type is 
decoupled from the subsequent experience in a precise sense) where dynamic optimal mechanism 
design is possible. Here, we present the Virtual Index Mechanism, an optimal dynamic mech- 
anism, which maximizes the (long term) virtual surplus using the classical Gittins algorithm. 
The mechanism optimally balances exploration and exploitation, taking incentives into account. 

We pay close attention to the applicability of our results to the (repeated) ad auctions used in 
sponsored search, where a given ad space is repeatedly allocated to advertisers. The value of an 
ad allocation to a given advertiser depends on multiple factors such as the probability that a user 
clicks on the ad, the likelihood that the user performs a valuable transaction (such as a purchase) 
on the advertiser's website and, ultimately, the value of that transaction. Furthermore, some 
of the private information is learned over time, for example, as the advertiser obtains better 
estimates of the likelihood of a transaction occurring. We provide a dynamic mechanism that 
extracts the maximum feasible revenue given the constraints imposed by the need to repeatedly 
elicit information. 

One interesting implication of our results is a certain revenue equivalence between public 
and private experience, in these separable environments. The optimal revenue is no less than 
if agents' private experience (which they are free to misreport, if they are not incentivized 
appropriately) were instead publicly observed by the mechanism. 
1 Introduction



Designing mechanisms in dynamic environments, in which agents' valuations evolve as a function
of their "experience" with the allocated items, is a problem which has received much recent
interest. One of the most compelling applications here is that of ad auctions in sponsored search, in
which search engines sell the advertisement spaces that appear alongside the search results.

Let us discuss the sponsored search example in more detail: typically, an advertiser places an 
ad in order to: first, draw a client to visit the advertiser's website (via a click on the displayed 
ad), and then, subsequently, have the client purchase some product. The expected value that an 
advertiser obtains from a displayed ad depends on both the "click-through rate" (the probability 
that a user clicks on the ad, sending the user to the advertiser's website) and the "conversion 
rate" (the probability that the user who visits the website performs a desired transaction, e.g. a 
purchase). This is a dynamic environment in which both advertisers and the search engine (the 
mechanism) learn and update their estimates of these rates over time. Observe that a click is a public 
experience, i.e., observed by both the advertiser and the search engine. In contrast, a transaction 
is only observed by the advertiser: it is a private experience of the advertiser with the displayed ad.
The dynamic challenge here is to design appropriate mechanisms which align incentives such that 
the search engine and the advertiser share this information for some desired outcome. 

In the mechanisms commonly used in practice, the learning of click-through rates and conversion rates
has been separated due to this asymmetry of information (see Mahdian and Tomak [2007],
Agarwal et al. [2009] for further discussions on "pay-per-action" pricing schemes). Two fundamental
questions arise in this setting: how much revenue does this asymmetry of information
cost the mechanism? How much more revenue would the mechanism be able to obtain if it were
able to monitor the transactions on the advertisers' websites?

In the static setting, the two foremost objectives for a mechanism are either maximizing the
social welfare of the buyers (efficiency) or maximizing the revenue of the seller (optimality),
though the spectrum of other objectives is large and notable. By extension, these are the two natural
objectives to consider for the dynamic setting. With regard to maximizing the future social welfare
in a dynamic setting, there is an elegant extension of the efficient (VCG) mechanism applicable to
quite general dynamic settings by Parkes and Singh [2003], Bergemann and Välimäki [2007]. This
dynamic mechanism seamlessly inherits the core concepts of the static VCG mechanism, namely
charging an agent the externality they impose, which is implemented via dynamic programming
ideas. Related dynamic mechanisms include the dynamic budget-balanced efficient mechanism
by Athey and Segal [2007], efficient mechanisms for dynamic populations by Cavallo et al. [2007],
and non-Bayesian (asymptotically) efficient dynamic mechanisms (see Nazerzadeh et al. [2008],
Babaioff et al. [2009]).



With regard to optimal mechanisms in a dynamic setting, the state of affairs is murkier.
As we discuss in the next section, while there are detailed characterizations of necessary
conditions which (incentive compatible) dynamic mechanisms must satisfy, there are only a few
rather restricted special cases for which optimal mechanisms are characterized (e.g., see Pavan et al.
[2008]). To some extent, results for special cases are to be expected: even in the simpler static
mechanism design problem, the efficient (VCG) mechanism is applicable to general settings (e.g.,
combinatorial auctions with no distributional assumptions), while the optimal mechanism for
selling a single item (provided in the seminal work of Myerson [1981]) is only applicable under
certain distributional restrictions.

In the more challenging dynamic setting, perhaps the most central question is understanding 
what natural restrictions permit the design of optimal mechanisms. This is the focus of this paper,
and we provide a certain structural characterization of a "separable" environment (allowing for
public and private experience), in which optimal dynamic mechanism design is possible. Our
characterization is rather rich in that it both permits natural stochastic processes (where private
and public signals can be discrete or from abstract spaces) and is applicable to certain natural
formalizations of the aforementioned sponsored search setting (where both public click-through
rates and private purchase rates evolve over time). Our construction draws a rather close connection
to efficient mechanism design (where our optimal mechanism utilizes the efficient mechanism for a
certain affinely transformed social welfare function). Furthermore, we also address the issue of how
much revenue is lost due to private signals (rather than public signals, observed by the mechanism)
in these separable environments: somewhat surprisingly, there is no loss.



1.1 Contributions 



Our main contribution is designing an individually rational and incentive compatible, revenue-optimal
mechanism, called the Virtual Index Mechanism, for settings with 1 seller and k agents
(buyers) where the environment satisfies certain separability properties and evolves according to a
multi-armed bandit process.

The Virtual Index Mechanism is quite simple. In short, the allocation rule of the mechanism is
based on the notion of Gittins indices (see Gittins [1989], Whittle [1982]) and the payment rule is
derived by considering a dynamic VCG mechanism (where the social welfare function is transformed
under a particular, time-varying affine function).

The allocation assigns to each agent an "index" which is computed based solely on the agent's
current state, and at each step the mechanism allocates the item to an agent with the highest
index. As in Gittins [1989], Whittle [1982], the key observation is that this computation does not
require specifying a policy in terms of the (potentially exponentially many) histories. If all the
agents are truthful, then the mechanism maximizes the "virtual surplus", an idea pioneered in
Myerson [1981].



It turns out that the allocation we use also coincides with the efficient dynamic (VCG) mech- 
anism (with respect to a transformed social welfare function). Due to this, the payment rule of 
our mechanism is rather simple to specify. In fact, one of our technical contributions is using 
this reduction to dynamic (affine) VCG in the construction of a revenue-optimal mechanism. This 
connection is useful for two reasons: first, it allows us to reduce the problem of checking incentive 
compatibility to essentially a one period problem. Second, this VCG pricing allows us to utilize 
rather general multi-armed bandit processes. 

A surprising implication of our result is that the seller does not lose any revenue (under the 
optimal mechanism) if the experience were private rather than publicly observed. In the context 
of sponsored search, this implies that (in environments which satisfy our "separability" assump- 
tion) the ability to monitor the transactions that occur in the advertisers' own websites does not 
increase the revenue. An important business insight provided by this result is that pay-per-action 
mechanisms can be implemented without a loss of revenue if the search engine is able to commit 
to a long-term contract. 
1.2 Related Work



The most closely related work to ours is that of Pavan et al. [2008, 2009]. The primary contribution
of this work is that it establishes rather detailed necessary conditions for dynamic incentive
compatibility in both finite horizons (Pavan et al. [2008]) and infinite horizons (Pavan et al. [2009]).
In addition, Pavan et al. [2008] also establish a dynamic version of the Revenue Equivalence Theorem
in a particular finite horizon setting. With regard to providing dynamic optimal mechanisms,
their work only provides mechanisms in somewhat limited special cases, such as when valuations
evolve according to a certain auto-regressive AR(k) stochastic process, and, in Pavan et al.
[2009], when the value evolves in a particular additive manner, where each private
experience of the agents is assumed to be independent of all previous private experiences (it is
allowed to depend only on the number of times the item was previously allocated, a much more
restrictive assumption than those required by our results). We should also note the work of Deb
[2008], which provides an optimal mechanism in a restricted setting where the value is Markov in
the previous value, among other technical conditions; again, this model does not permit rich
dependencies on historical signals. In our sponsored search example, these prior results are not
applicable, due both to the multiplicative nature of the value function (as discussed later) and
to the fact that the sequence of experiences is not independent (e.g., with a Bayesian, "Bernoulli"
prior on the probabilities of binary "click" or "purchase" events, the experiences are not necessarily
independent).



Also, in contrast to Pavan et al. [2008, 2009], we should emphasize that the aim of our work
is not to characterize necessary conditions which any (incentive compatible) dynamic mechanism
must satisfy; our focus is on the optimal mechanism itself. In fact, we do not even utilize the
dynamic "envelope" conditions provided by Pavan et al. [2008, 2009], as they require many detailed
technical assumptions, the signals (the experiences) to be real valued, and, often, certain
probability kernels to have densities and be differentiable. As such, these conditions either do
not hold or are difficult to verify in our setting. Instead, our derivation proceeds from merely
static considerations, where we use only incentive compatibility constraints from static mechanism
design theory (see Milgrom and Segal [2002]) to establish the expected revenue of any (incentive
compatible) mechanism (e.g., the so-called "envelope theorem"). Certainly, the sufficient conditions
provided by Pavan et al. [2008, 2009] (derived with rather sophisticated proofs) are stronger
than those conditions used here, as they explicitly account for dynamic considerations and are
interesting in their own right. One further direction is whether these conditions can be used to derive
optimal dynamic mechanisms in settings more general than those provided here.

Conceptually, our proof is rather simple: both the connection to dynamic (affine) VCG and 
our use of only a static "envelope" condition allow us to reduce the proof of dynamic incentive 
compatibility to essentially a one-period, static problem. However, this one-period verification 
requires a delicate stochastic coupling argument, where we utilize both the bandit nature and the 
separability of our stochastic process. 



We also briefly mention other notable work here. Vulcano et al. [2002] analyze the problem
of optimal dynamic mechanism design in the context of perishable goods (work later expanded
on by Pai and Vohra [2008]). These works build upon dynamic programming ideas to extend the
classical result of Myerson [1981] to a dynamic setting. A key assumption in both of these papers
is that agents' valuations do not evolve over time. Battaglini [2005] studies the question of optimal
mechanism design in a setting with a single consumer whose private information is given by a
2-state Markov chain. Eső and Szentes [2007] obtain a result in a two-period model that is similar
in flavor to ours: the buyers do not benefit from obtaining new private information at period 2 if
the seller uses a 'handicap' auction, where the handicap is given by the buyer's virtual value at
period 1.

^ We note that the work in Pavan et al. [2009] is a recent preliminary draft, and is concurrent to this work.



1.3 Organization 

We organize our paper as follows. In Section 2 we formalize our model, define separable environments,
show examples and define the basic notions we use throughout the paper, such as incentive
compatibility and optimality of mechanisms. In Section 3 we consider a variant of our model where
the mechanism can monitor the private experiences of the agents and describe how it establishes
a bound for the revenue of a mechanism in our setting. In Section 4 we state and explain our main
theorem: the optimality of the Virtual Index Mechanism. All proofs are in the Appendix.



2 Preliminaries 
2.1 Environment 

We consider a setting with 1 seller and $k$ agents (buyers) who are competing for items that are being
allocated at every timestep (starting at $t = 1$) over a discrete-time infinite horizon. At the start of
$t = 1$, agents (privately) learn their initial types. The initial type of agent $i$ is a (non-negative) real
number $\theta_i \in [0, \Theta_i]$, independently distributed according to some given distribution $F_i(\cdot)$. At every
subsequent timestep, the state of each agent $i$ is summarized by the tuple of their initial type $\theta_i$
and their (subsequent) "experience" with the item. This experience summarizes the type of the
agent due to interactions with the item, and it need not be real valued. More precisely,
agent $i$'s state at time $t$ is of the form $(\theta_i, e_{i,t}, p_{i,t})$, where the current state of the private experience
is denoted by $e_{i,t} \in \mathcal{E}_i$ and the public experience is denoted by $p_{i,t} \in \mathcal{P}_i$, where $\mathcal{E}_i$ and $\mathcal{P}_i$ are some
(potentially arbitrary) sets. Here, only the agent observes their private experience, while the public
experience is also observed by the mechanism.

We should emphasize that the first type $\theta$ is real for reasons similar to those in the static setting:
derivations of optimal mechanisms typically involve calculus on real valued types. However, it
is only this first type that we assume to be real valued (subsequent experience is allowed to live in
arbitrary signal spaces). As we specify later, this first type $\theta$ has a persistent effect on the incentives
(e.g., the values after $t = 1$ could also depend on the initial type).

If agent $i$ is allocated, the state of agent $i$'s public and private experience with the item evolves
in a Markovian manner. If $i$ is not allocated, then the experience does not change; in this sense,
we are dealing with a Markovian "bandit" process. By nature of public information, we assume
the public process is completely decoupled from private information, e.g., the probability that the
next public experience is $p_i'$ conditioned on the current experience being $p_i$ is $G(p_i' \mid p_i)$. In the most
general sense, the evolution of the private experience could depend on the entire current state, e.g.,
the probability that the next private experience is $e_i'$ conditioned on the current state $(\theta_i, e_i, p_i)$ is
$H(e_i' \mid \theta_i, e_i, p_i)$. However, it turns out that we are not able to handle this level of generality, and our
structural characterization specifies certain natural restrictions (in the next subsection).

Note that the public experience evolution process only depends on the public times series, while 
the private experience process is allowed to depend on both private and public experience. We 
assume that private experience is only observed by the agent, but the public information is also
observed by the mechanism. At time $t = 1$ (prior to the allocation at $t = 1$), for every agent $i$,
$e_{i,1} = \emptyset$ is the empty experience, known by both the agent and the mechanism.

The (instantaneous) value of each agent $i$ at time $t$ is a (stationary) function of their current
state. In particular, agent $i$'s value is $v_i(\theta_i, e_{i,t}, p_{i,t})$ when in state $(\theta_i, e_{i,t}, p_{i,t})$ at time $t$. The
expected (future) value of agent $i$ for the item at time $t$ is equal to $\delta^{t-1} v_i(\theta_i, e_{i,t}, p_{i,t})$, where $\delta$,
$0 < \delta < 1$, is the common discount factor.

2.2 Separable Environments 

Characterizing the assumptions which permit optimal mechanism design is perhaps the most central
question; even in the static setting, optimal mechanisms are known only in (natural) special
cases. In our dynamic setting, it turns out that we are not able to characterize an optimal mechanism
in the full generality of the above "bandit" environment. However, our main contribution is specifying
a natural "separable" environment, under which we can derive an optimal mechanism. We
say that the environment is separable if both the stochastic process over the types and the value
functions themselves are separable, in a precise sense which we now define. Intuitively, the notion
of separability decouples the initial (real valued) type $\theta$ from the experience, both in terms of the
stochastic evolution and the incentive structure.

Definition 2.1. The stochastic process is said to be separable if the evolution of the private experience
is Markovian in the current experience, i.e., the probability that the next private experience
is $e_i'$ conditioned on the current state $(\theta_i, e_i, p_i)$ is $H(e_i' \mid e_i, p_i)$ (in particular, $H$ does not depend on
$\theta_i$).

We consider two natural classes of separable value functions. 
Definition 2.2. Additively or multiplicatively separable value functions are defined as follows:

• An additively separable value function has the following functional form, for all $i$, $\theta_i$, $e_i$, and
$p_i$:

$$v_i(\theta_i, e_i, p_i) = A_i(\theta_i, p_i) + B_i(e_i, p_i)$$

• A multiplicatively separable value function is of the form:

$$v_i(\theta_i, e_i, p_i) = A_i(\theta_i) B_i(e_i, p_i) - C_i(p_i)$$

Taken together, we say that the environment is (additively or multiplicatively) separable.

2.3 Examples of Separable Environments 

We now provide two examples of settings for separable value functions, which fall within our 
framework (and satisfy our assumptions). 

Sponsored Search: Consider an auction for a keyword that corresponds to a certain product. 
Suppose i is an online retailer of such a product who participates in the corresponding sponsored 
search auction. Every time a user types in the keyword, the ad space is allocated to (at most) 
one retailer. Every time a user purchases the product from them, the retailer i obtains a value of 
$\theta_i$ (and 0 otherwise). The private experience $e_{i,t}$ describes the retailer's Bayesian belief about the
probability of a purchase given that a click has occurred. Similarly, the public experience $p_{i,t}$ represents
the Bayesian belief about the probability of a click occurring given that the retailer's ad is shown.
Therefore, $v_i(\theta_i, e_{i,t}, p_{i,t}) = \theta_i \Pr[\text{purchase} \mid e_{i,t}, \text{click}] \, \Pr[\text{click} \mid p_{i,t}]$. Each time the ad of retailer
$i$ is shown to a user, both the retailer and the search engine update the belief $p_{i,t}$ about the probability
of a click. After each click, the retailer updates their belief $e_{i,t}$ about the probability of a purchase.
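
As a rough illustration of the sponsored search example, the sketch below tracks the two beliefs with Beta posteriors over Bernoulli click and purchase events. The Beta prior family, the class name `BetaBelief`, and the helpers `value` and `observe_impression` are assumptions made purely for illustration; the text only requires that the beliefs be Bayesian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) posterior over a Bernoulli success probability (assumed prior family)."""
    alpha: float = 1.0
    beta: float = 1.0

    def mean(self) -> float:
        # Posterior mean, i.e. the predictive probability of the next success.
        return self.alpha / (self.alpha + self.beta)

    def update(self, success: bool) -> "BetaBelief":
        # Conjugate update after observing one Bernoulli outcome.
        if success:
            return BetaBelief(self.alpha + 1.0, self.beta)
        return BetaBelief(self.alpha, self.beta + 1.0)


def value(theta: float, e: BetaBelief, p: BetaBelief) -> float:
    """v_i(theta, e, p) = theta * Pr[purchase | e, click] * Pr[click | p]."""
    return theta * e.mean() * p.mean()


def observe_impression(e: BetaBelief, p: BetaBelief, clicked: bool, purchased: bool):
    """One allocation: the click is public, the purchase is seen only by the retailer."""
    p_next = p.update(clicked)
    # The purchase belief changes only when a click actually occurred.
    e_next = e.update(purchased) if clicked else e
    return e_next, p_next


e0, p0 = BetaBelief(), BetaBelief()
e1, p1 = observe_impression(e0, p0, clicked=True, purchased=False)
print(value(10.0, e0, p0), value(10.0, e1, p1))
```

In this sketch the value is multiplicatively separable with $A_i(\theta_i) = \theta_i$, $B_i(e_i, p_i)$ equal to the product of the two posterior means, and $C_i = 0$.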

Auto-Regressive (AR): The evolution of the valuation of each agent $i$ in an AR(1) model is
as follows. The initial value of agent $i$ is given by $v_{i,0} = \theta_i$, and every time the item is allocated
to agent $i$ his valuation is updated according to $v_{i,t+1} = A_i v_{i,t} + B_i(e_{i,t}, p_{i,t})$. In the AR model
considered in Pavan et al. [2008, 2009], the agent's value is updated by adding an independent
shock, or a shock that depends only on the previous allocations (e.g., number of times the item
was allocated to the agent), but is independent of all private information. Our model allows for the
update of the value to depend both on the private and the public experiences. They also consider
AR(k) processes, where the valuation is updated according to an affine function of the values at the
$k$ previous times the item was allocated to the agent. Our restriction to AR(1) models is without
loss of generality since, by augmenting the state space, an AR(k) process can be represented as an
AR(1) process (through an appropriate choice of the function $A_i(\cdot)$).
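
The state-augmentation remark can be made concrete with a small sketch. It assumes an AR(k) recursion of the form $v_{t+1} = \sum_{j=1}^{k} a_j v_{t+1-j} + b_t$ with hypothetical coefficients `a` and shocks `b`; the vector of the last $k$ values then follows a first-order recursion in which a companion matrix plays the role of $A_i$. The function names are illustrative and do not come from the paper.

```python
def ar_k_direct(a, history, shocks):
    """Iterate v_{t+1} = sum_j a[j] * (j-th most recent value) + shock directly.

    `history` holds the k most recent values, newest first.
    """
    hist = list(history)
    out = []
    for b in shocks:
        v_next = sum(aj * vj for aj, vj in zip(a, hist)) + b
        hist = [v_next] + hist[:-1]  # shift the window of recent values
        out.append(v_next)
    return out


def companion(a):
    """k x k companion matrix: first row is a, the sub-diagonal shifts old values down."""
    k = len(a)
    rows = [list(a)]
    for r in range(1, k):
        rows.append([1.0 if c == r - 1 else 0.0 for c in range(k)])
    return rows


def ar_1_augmented(a, history, shocks):
    """Same process written as x_{t+1} = M x_t + (shock, 0, ..., 0) on the augmented state."""
    m = companion(a)
    x = list(history)
    k = len(x)
    out = []
    for b in shocks:
        x_next = [sum(m[r][c] * x[c] for c in range(k)) for r in range(k)]
        x_next[0] += b  # the shock enters only the newest coordinate
        x = x_next
        out.append(x[0])
    return out


a = [0.5, 0.3]
history = [1.0, 2.0]  # most recent value 1.0, the one before it 2.0
shocks = [0.1, -0.2, 0.0]
print(ar_k_direct(a, history, shocks))
print(ar_1_augmented(a, history, shocks))
```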

2.4 Mechanisms, Incentive Constraints, and Optimality 

By the Revelation Principle (cf. Myerson [1986]), without loss of generality we can focus on direct
mechanisms. A direct mechanism $\mathcal{M}(\mathcal{Q}, \mathcal{P})$ is defined by a pair of an allocation rule $\mathcal{Q}$ and a
payment rule $\mathcal{P}$. In a dynamic direct mechanism, at each timestep $t$, an agent is asked to report
their current private state pair $(\theta_i, e_{i,t})$; we denote this report by $(\hat\theta_{i,t}, \hat e_{i,t})$. Note that a direct
mechanism elicits redundant information, as the initial type $\theta_i$ of an agent remains constant over
time while the mechanism asks the agent to re-report this type every round (similarly, the private
experience of an agent does not evolve in a period in which the agent did not receive an allocation, yet the
mechanism asks for re-reports).

We denote the (joint) vector of reports, public state, allocations, and payments at time $t$ by
$(\hat\theta_t, \hat e_t, p_t, q_t, \mathrm{p}_t)$, where $q_{i,t}$ and $\mathrm{p}_{i,t}$ correspond to the allocation and payment of agent $i$ at time $t$
($q_{i,t} = 1$ if $i$ received the item at time $t$ and 0 otherwise). The history $h_t$ observed by the seller
at any given time $t$ includes the past reports, the past public experience, the past allocations, and
the past payments (and does not include either the past private experiences or the initial types).
The history $h_{i,t}$ observed by an agent $i$ at time $t$ includes her initial type, her past private
experiences, her prior reports, the prior payments, and the prior allocations (and does not include
other agents' true or reported private experiences or initial types).

In short, at each timestep $t \geq 1$ the following sequence of events occurs:

1. Each agent $i$ reports $(\hat\theta_{i,t}, \hat e_{i,t})$ only to the mechanism.

2. The mechanism allocates the item to an agent $i^*$, in which case $q_{i^*,t} = 1$ (or potentially to no one).

3. Each agent $i$ is charged $\mathrm{p}_{i,t}$.

4. Agent $i^*$'s (private and public) experience evolves.
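
The four steps above can be sketched as a single function. This is only an illustration of the order of events under stated assumptions: the state layout (a dict with keys `theta`, `e`, `p`), the function `run_timestep`, and the placeholder allocation, payment and evolution rules are all hypothetical and are not the rules of the Virtual Index Mechanism.

```python
import random
from typing import Callable, List, Optional, Tuple

Report = Tuple[float, object]  # (reported initial type, reported private experience)


def run_timestep(
    states: List[dict],
    report: Callable[[int, dict], Report],
    allocate: Callable[[List[Report], List[object]], Optional[int]],
    charge: Callable[[List[Report], List[object], Optional[int]], List[float]],
    evolve: Callable[[dict, random.Random], dict],
    rng: random.Random,
) -> Tuple[Optional[int], List[float]]:
    # 1. Each agent reports (theta_hat, e_hat) privately to the mechanism.
    reports = [report(i, s) for i, s in enumerate(states)]
    public = [s["p"] for s in states]  # public experience is observed directly
    # 2. The mechanism picks a winner (or None) from reports and public state only.
    winner = allocate(reports, public)
    # 3. Every agent is charged a payment.
    payments = charge(reports, public, winner)
    # 4. Bandit property: only the allocated agent's experience evolves.
    if winner is not None:
        states[winner] = evolve(states[winner], rng)
    return winner, payments


def truthful_report(i, state):
    return (state["theta"], state["e"])


def highest_type_wins(reports, public):
    # Placeholder rule: allocate to the highest reported initial type.
    return max(range(len(reports)), key=lambda i: reports[i][0])


def zero_payments(reports, public, winner):
    # Placeholder rule: charges nothing; a real mechanism would price the allocation.
    return [0.0] * len(reports)


def count_pulls(state, rng):
    # Toy evolution: both experiences simply count allocations.
    return {"theta": state["theta"], "e": state["e"] + 1, "p": state["p"] + 1}


states = [{"theta": 1.0, "e": 0, "p": 0}, {"theta": 2.0, "e": 0, "p": 0}]
winner, payments = run_timestep(states, truthful_report, highest_type_wins,
                                zero_payments, count_pulls, random.Random(0))
print(winner, payments, states)
```

Passing the rules in as callables keeps the protocol separate from any particular allocation or payment rule, which is the only point the sketch is meant to make.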



^ The Revelation Principle implies that an equilibrium outcome in any indirect mechanism can also be induced as 
an equilibrium outcome of an (incentive compatible) direct mechanism. 
We now define the incentive constraints of the mechanism. First, some definitions are in order.
A reporting strategy for agent $i$ is a mapping from her type, her private experience state, and
the history to a report (of her initial type and the current state of the private experience). Let
$R$ denote a joint reporting strategy and $R_i$ denote this strategy for $i$. For mechanism $\mathcal{M}$, define
the (discounted) expected future value and payment of agent $i$ at time $t$, under (joint) reporting
strategy $R$, conditioned on some event $h$, as follows:

$$v^{\mathcal{M},R}_{i,t}(h) = \mathbb{E}\left[\sum_{t'=t}^{\infty} \delta^{t'-1} q_{i,t'} \, v_i(\theta_i, e_{i,t'}, p_{i,t'}) \;\middle|\; h\right], \qquad \mathrm{p}^{\mathcal{M},R}_{i,t}(h) = \mathbb{E}\left[\sum_{t'=t}^{\infty} \delta^{t'-1} \mathrm{p}_{i,t'} \;\middle|\; h\right],$$

where the evolution of the process is under $\mathcal{M}$ and reporting strategy $R$, conditioned on the
event $h$ (the expectation is with respect to all variables not conditioned on). For example, for some
current state $(\theta_i, e_{i,t}, p_{i,t})$ for $i$ and history $h_{i,t}$ for agent $i$, $v^{\mathcal{M},R}_{i,t}(\theta_i, e_{i,t}, p_{i,t}, h_{i,t})$ is the expected future
value of $i$ conditioned on her knowledge at time $t$. Similarly, define the (discounted) expected future
utility for agent $i$ as:

$$U^{\mathcal{M},R}_{i,t}(h) = v^{\mathcal{M},R}_{i,t}(h) - \mathrm{p}^{\mathcal{M},R}_{i,t}(h) \qquad (1)$$

We say that $R_i$ is a best response to $R_{-i}$ conditioned on event $h$ (for agent $i$) if $R_i$ maximizes
her utility, i.e., $U^{\mathcal{M},(R_i,R_{-i})}_{i,t}(h)$ is at least $U^{\mathcal{M},(R_i',R_{-i})}_{i,t}(h)$ for all other $R_i'$ (where $R_{-i}$ is held fixed). We
say the truthtelling strategy $T$ is the reporting strategy under which all agents always report their
initial types and their private experiences truthfully.

We now define incentive compatibility. Roughly speaking, this concept says that as long as all 
agents are truthful, then no agent ever wants to deviate. 

Definition 2.3. (Incentive Compatibility) A dynamic direct mechanism is incentive compatible if,
for each agent $i$, with probability one, truthtelling is a best response (assuming the other agents to
be truthful) at each time $t$ with respect to the history of $i$ at time $t$. Precisely, with probability 1,
for all times $t$ and all $R_i$,

$$U^{\mathcal{M},T}_{i,t}(\theta_i, e_{i,t}, p_{i,t}, h_{i,t}) \geq U^{\mathcal{M},(R_i,T_{-i})}_{i,t}(\theta_i, e_{i,t}, p_{i,t}, h_{i,t})$$

where the probability is with respect to $\theta_i, e_{i,t}, p_{i,t}, h_{i,t}$ sampled under the truthful reporting strategy.

We consider a stronger notion, namely periodic ex-post incentive compatibility, where best
responses hold even on histories where misreports occur (see Definition 4.1).

We also allow the following participation constraint, in which agents may opt out at any time
for 0 future utility.

Definition 2.4. (Individual Rationality) Under an individually rational mechanism, for each agent
$i$, with probability 1, truthful agents obtain a non-negative expected future utility assuming the other
agents are truthful. Precisely, with probability 1, for all times $t$,

$$U^{\mathcal{M},T}_{i,t}(\theta_i, e_{i,t}, p_{i,t}, h_{i,t}) \geq 0$$

where the probability is with respect to $\theta_i, e_{i,t}, p_{i,t}, h_{i,t}$ sampled under the truthful reporting strategy.
The expected revenue of an incentive compatible mechanism under the truthful strategy is the
discounted sum of all payments of the agents, i.e.,

$$\mathrm{Rev}^{\mathcal{M}} = \mathbb{E}\left[\sum_{t=1}^{\infty} \sum_{i=1}^{k} \delta^{t-1} \, \mathrm{p}_{i,t}\right] \qquad (2)$$

The objective of the seller is to maximize this expected revenue, subject to both the incentive
compatibility constraint and rationality constraint. Precisely,

Definition 2.5. (Optimality) An individually rational and incentive compatible mechanism is optimal
if it maximizes the expected revenue among all individually rational and incentive compatible
mechanisms.



3 A Dynamic Constraint From Static Considerations 



In this section, we establish a bound on the maximum revenue a seller can obtain given the incentive
constraints of the agents. We consider a modified setting that we call the complete dynamic
monitoring problem and show that it determines a bound for the seller's revenue in the model
defined in Section 2.

Consider the modified setting where the seller can fully monitor the agents' private experiences
$\{e_{i,t}\}$, for each agent $i$ and all $t > 0$. The seller still cannot observe the initial types $\theta_i$, and,
as before, agents report $\theta_i$ to the mechanism in the initial period. We denote this setting the
complete dynamic monitoring problem. In this new setting, the mechanism design problem is
a completely static one. Therefore, the incentive compatibility results of Milgrom and Segal [2002]
for static mechanism design apply here. The following "revenue equivalence" theorem establishes
the revenue obtained by an incentive compatible mechanism in the complete dynamic monitoring
setting; the proof is in Appendix A.1.

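Theorem 3.1. Assume complete dynamic monitoring. Assume as well that the partial derivative
$\partial v_i(\theta_i, e_i, p_i) / \partial \theta_i$ exists for all $\theta_i$, $e_i$ and $p_i$ and there exists some $B < \infty$ such that
$|\partial v_i(\theta_i, e_i, p_i) / \partial \theta_i| \leq B$ for all $\theta_i$, $e_i$ and $p_i$. Then, the revenue $\mathrm{Rev}^{\mathcal{M}}$ of any such incentive
compatible mechanism $\mathcal{M}$ satisfies

$$\mathrm{Rev}^{\mathcal{M}} = \mathbb{E}\left[\sum_{t=1}^{\infty} \sum_{i=1}^{k} \delta^{t-1} q_{i,t} \, \psi_i(\theta_i, e_{i,t}, p_{i,t})\right] - \sum_{i=1}^{k} \mathbb{E}\left[U^{\mathcal{M}}_i(0, \theta_{-i})\right] \qquad (3)$$

where $\psi_i$ is defined as

$$\psi_i(\theta_i, e_{i,t}, p_{i,t}) = v_i(\theta_i, e_{i,t}, p_{i,t}) - \frac{1 - F_i(\theta_i)}{f_i(\theta_i)} \, \frac{\partial v_i(\theta_i, e_{i,t}, p_{i,t})}{\partial \theta_i} \qquad (4)$$

and $U^{\mathcal{M}}_i(0, \theta_{-i})$ is the utility agent $i$ obtains if his type is equal to 0 and the other agents' types are
$\theta_{-i}$.

Similarly to Myerson [1981], we refer to $\psi_i$ as the virtual value. The first term on the right-hand side
of Eq. (3) is the virtual surplus. Individual rationality is equivalent to the requirement that
$U^{\mathcal{M}}_i(0, \theta_{-i}) \geq 0$ for all $i$ and $\theta_{-i}$. Therefore, Eq. (3) implies that for any incentive compatible and
individually rational mechanism $\mathcal{M}$ in the complete dynamic monitoring setting,

$$\mathrm{Rev}^{\mathcal{M}} \leq \mathbb{E}\left[\sum_{t=1}^{\infty} \sum_{i=1}^{k} \delta^{t-1} q_{i,t} \, \psi_i(\theta_i, e_{i,t}, p_{i,t})\right] \qquad (5)$$

if the assumptions of Theorem 3.1 hold.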
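
As a small numerical illustration of the virtual value in Eq. (4), the sketch below evaluates $\psi_i$ with a central-difference derivative in $\theta_i$. The function name `virtual_value`, the uniform prior on $[0, 1]$, and the two example value functions are assumptions chosen for illustration only.

```python
def virtual_value(v, theta, e, p, cdf, pdf, h=1e-6):
    """psi = v - (1 - F(theta)) / f(theta) * dv/dtheta, using a central difference in theta."""
    dv = (v(theta + h, e, p) - v(theta - h, e, p)) / (2.0 * h)
    return v(theta, e, p) - (1.0 - cdf(theta)) / pdf(theta) * dv


def uniform_cdf(theta):
    # F(theta) for a uniform prior on [0, 1] (assumed for the example).
    return theta


def uniform_pdf(theta):
    # f(theta) for the same uniform prior.
    return 1.0


def v_multiplicative(theta, e, p):
    # Multiplicatively separable: A(theta) = theta, B(e, p) = e * p, C = 0.
    return theta * e * p


def v_additive(theta, e, p):
    # Additively separable: A(theta, p) = theta, B(e, p) = e * p.
    return theta + e * p


# Under the uniform prior these reduce to (2*theta - 1) * e * p and 2*theta - 1 + e * p.
psi_mult = virtual_value(v_multiplicative, 0.75, 0.5, 0.4, uniform_cdf, uniform_pdf)
psi_add = virtual_value(v_additive, 0.75, 0.5, 0.4, uniform_cdf, uniform_pdf)
print(psi_mult, psi_add)
```

Note that the virtual value can be negative (for instance, any $\theta_i < 1/2$ in the multiplicative example), which is why an allocation rule built on it may prefer not to serve some agents.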

Now consider a direct mechanism $\mathcal{M}$ for the original setting without monitoring which is both
incentive compatible and individually rational. The exact same mechanism, including the allocation
and payment rules, can be applied in the setting with complete dynamic monitoring. To do so, we
simply replace the agent's reported private experiences $\{\hat e_{i,t}\}$ by the agent's (monitored) private
experiences $\{e_{i,t}\}$ in the input of the allocation and payment rules. Any strategy available to the
agents in the setting with complete dynamic monitoring is a feasible strategy in the setting without
monitoring (where the agent reports truthfully after the initial report). Therefore, if all other
agents are truthful, any profitable deviation from the truthful strategy in the setting with complete
dynamic monitoring implies a profitable deviation in the setting without monitoring. Since no
such profitable deviations exist in the setting without monitoring, we obtain that the mechanism
$\mathcal{M}$ is both incentive compatible and individually rational in the setting with complete dynamic
monitoring. Therefore, Eq. (5) establishes an upper bound on the revenue of mechanisms for the
setting without monitoring as well.

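Corollary 3.1. Assume the partial derivative $\partial v_i(\theta_i, e_i, p_i) / \partial \theta_i$ exists for all $\theta_i$, $e_i$ and $p_i$ and there
exists some $B < \infty$ such that $|\partial v_i(\theta_i, e_i, p_i) / \partial \theta_i| \leq B$ for all $\theta_i$, $e_i$ and $p_i$. Then, the revenue $\mathrm{Rev}^{\mathcal{M}}$ of
any incentive compatible, individually rational mechanism $\mathcal{M}$ satisfies

$$\mathrm{Rev}^{\mathcal{M}} \leq \max_{\mathcal{Q} \in \mathbb{Q}} \mathbb{E}\left[\sum_{t=1}^{\infty} \sum_{i=1}^{k} \delta^{t-1} q_{i,t} \, \psi_i(\theta_i, e_{i,t}, p_{i,t})\right] \qquad (6)$$

where $\mathbb{Q}$ represents the set of all allocation rules.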



The corollary suggests a candidate allocation rule for an optimal mechanism in the setting
without monitoring. The maximization problem on the right-hand side of Eq. (6) is a multi-armed
bandit problem, where the payoff of the arms is given by the virtual values. This optimization problem
can be solved using Gittins indices (see Gittins [1989], Whittle [1982]). We use this allocation rule
in the mechanism we design in the next section.



4 The Virtual Index Mechanism 

In this section, we present our main result, an optimal dynamic mechanism, called the Virtual 
Index Mechanism. In short, the allocation rule is as follows: the mechanism assigns to each agent 
an "index" (computed based on virtual values) and at each step allocates the item to an agent with 
the highest index. If all the agents are truthful then the mechanism maximizes the revenue as well 
as the virtual surplus. Furthermore, the mechanism enjoys more desirable incentive constraints — 
it satisfies stronger notions of incentive compatibility and individual rationality. 

Definition 4.1. (Periodic Ex-post Incentive Compatibility) A dynamic direct mechanism is periodic 
ex-post incentive compatible if for all agents, truth-telling is a best response conditioned on any
historical event and conditioned on the current state of the other agents (assuming other agents
to be truthful in the future). Note here the historical event need not be a truthful history, but is 
arbitrary. 

Definition 4.2. (Periodic Ex-post Individual Rationality) A dynamic direct mechanism is periodic
ex-post individually rational if for all agents, conditioned on any historical event and conditioned
on the current state of the other agents, the agent's expected future utility is non-negative
under the truthful strategy (assuming other agents to be truthful in the future).

These two stronger incentive constraints are ensured by the dynamic VCG mechanism provided
by Bergemann and Välimäki [2007]. In our setting, our optimal mechanism also enjoys these
properties.

Our main theorem relies on the following: 

Assumption 4.1. (Separable Environment) Assume the environment is (additively or multiplica- 
tively) separable. 

Assumption 4.2. (Concave Values) Let $A_i$, $B_i$ and $C_i$ be as in the definition of separability in Section 2. Assume
$A_i$ is differentiable and non-decreasing with respect to $\theta_i$ in both cases (additive and multiplicative);
in the additive case, $A_i$ is concave; in the multiplicative case, $A_i$ is log-concave; $B_i$ and $C_i$ are both
bounded and non-negative.

Assumption 4.3. (Monotone Hazard Rate) The density of $F_i(\cdot)$ exists for every agent $i$ and is
denoted by $f_i$. Also, the inverse hazard rate $\frac{1 - F_i(\theta)}{f_i(\theta)}$ is decreasing in $\theta$.



Theorem 4.1. (Optimality) Suppose Assumptions 4.1, 4.2 and 4.3 hold. Then, the Virtual Index
Mechanism (as defined in Figure 1) is optimal. Furthermore, the Virtual Index Mechanism is
periodic ex-post incentive compatible and individually rational.

This theorem has a very surprising implication. The Virtual Index Mechanism is a mechanism
designed to maximize revenue in a context where the mechanism has to create appropriate incentives
for the agents to reveal private information over time. However, the revenue it produces is identical
to that of a mechanism that can (publicly) observe the dynamically evolving private experiences of the
agents (since the Virtual Index Mechanism is also feasible and optimal for the problem with complete
dynamic monitoring and, by construction, has the same revenue there). Hence, the capability to
monitor the agents' dynamically evolving private experiences does not yield any additional revenue for
the mechanism. We now formalize this claim.

Corollary 4.1. Suppose Assumptions 4.1, 4.2 and 4.3 hold. The Virtual Index Mechanism is
optimal for the setting with complete dynamic monitoring. Furthermore, the seller obtains the same
expected revenue in both the setting with complete dynamic monitoring and without monitoring.

We now proceed to describe the Virtual Index Mechanism in detail. The analysis of this theorem
is contained in Subsection 4.4.

4.1 Virtual Surplus, Social Welfare, and a Fictitious Phase 

If agents are truthful, we want the allocation rule to maximize the discounted sum of virtual
values $\psi_i(\theta_i, e_{i,t}, \rho_{i,t})$ defined in Section 3. However, in order to satisfy incentive constraints, we
consider a broader mechanism, which allows the agents to bid at a fictitious $t = 0$ phase. Crucially,
while we include this fictitious 0-th phase, we should point out that this phase actually occurs at
$t = 1$.

The Virtual Index Mechanism:

At (fictitious) time $t = 0$,

    Each agent $i$ reports $\hat{\theta}_{i,0}$.

    Each agent $i$ is charged $p_{i,0}$, see Eq. (13).

At each time $t = 1, 2, \ldots$

    Each agent $i$ reports $\hat{\theta}_{i,t}$ and $\hat{e}_{i,t}$.

    Allocate to $i^*$, an agent with the maximum virtual index $\mathcal{G}_i^{\hat{\theta}_0}(\hat{\theta}_{i,t}, \hat{e}_{i,t}, \rho_{i,t})$, see Eq. (9).

    Charge $i^*$ the price $p_{i^*,t}$, see Eq. (11).

Figure 1: Description of the Virtual Index Mechanism

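Let us first understand the nature of the virtual surplus under separable values. A key
observation is that separability of the value functions implies that the virtual value $\psi_i$ is an affine
function of the value, conditioned on $\theta_i$. Precisely,

Lemma 4.1. (Affine Virtual Values) For a separable value function, the virtual value is given by:

$$
\psi_i(\theta_i, e_{i,t}, \rho_{i,t}) = \alpha_i(\theta_i)\, v_i(\theta_i, e_{i,t}, \rho_{i,t}) + \beta_i(\theta_i, \rho_{i,t})
\tag{7}
$$

where the functions $\alpha_i(\theta_i)$ and $\beta_i$ are defined as follows:

• (Additive) $v_i(\theta_i, e_i, \rho_i) = A_i(\theta_i, \rho_i) + B_i(e_i, \rho_i)$:

$$
\alpha_i(\theta_i) = 1, \qquad
\beta_i(\theta_i, \rho_i) = -\frac{1 - F_i(\theta_i)}{f_i(\theta_i)}\, \frac{\partial A_i(\theta_i, \rho_i)}{\partial \theta_i}
$$

• (Multiplicative) $v_i(\theta_i, e_i, \rho_i) = A_i(\theta_i) B_i(e_i, \rho_i) - C_i(\rho_i)$:

$$
\alpha_i(\theta_i) = 1 - \frac{1 - F_i(\theta_i)}{f_i(\theta_i)}\, \frac{A_i'(\theta_i)}{A_i(\theta_i)}, \qquad
\beta_i(\theta_i, \rho_i) = \left(\alpha_i(\theta_i) - 1\right) C_i(\rho_i)
$$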

The observation is that once $\theta_i$ is fixed, the virtual values are an affine function of the values,
though this affine function could vary with time (through the additive term $\beta_i(\cdot)$). Recall that, in a static setting,
affine transformations of the social welfare function can be implemented via an affinely transformed
VCG mechanism. Here, we provide a (time-varying) transformation for the dynamic setting, using
an affine transformation of the mechanism of Bergemann and Välimäki [2007]. This reduction
satisfies the dynamic incentive constraints and leaves us only with the static incentive constraint
required for eliciting initial types $\theta$; this is the heart of our technical argument in Lemma 4.4.

The above structure motivates the use of a "fictitious phase", where we break the $t = 1$ phase
into two parts. In the first part of this phase (which we label as $t = 0$), the agents make a report
$r$ of $\theta$, which we use to specify the functions $\alpha$ and $\beta$. From then on, we proceed as an (affinely)
transformed VCG mechanism, where agents are allowed to re-report $\theta$ (though $\alpha$ and $\beta$ are pegged
to the initial report $r$). We now specify this precisely.
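
As a concrete check of Lemma 4.1, the following sketch evaluates the multiplicative case numerically. The specific choices are illustrative assumptions and are not taken from the paper: $F_i$ is uniform on $[0,1]$, $A_i(\theta) = \theta$, $B_i(e, \rho) = e$ and $C_i(\rho) = c$ is a constant. The function names are hypothetical. The sketch compares the virtual value computed directly as $v_i - \frac{1-F_i}{f_i}\,\partial v_i/\partial\theta_i$ (the form used in the proof of Lemma 4.1) with the affine form $\alpha_i v_i + \beta_i$.

```python
# Illustrative assumptions (not from the paper): F uniform on [0, 1],
# A(theta) = theta, B(e, rho) = e, C(rho) = c (a constant).


def inverse_hazard(theta):
    """(1 - F(theta)) / f(theta) for the uniform distribution on [0, 1]."""
    return 1.0 - theta


def value(theta, e, c):
    """Multiplicatively separable value v = A(theta) * B(e) - C."""
    return theta * e - c


def virtual_value_direct(theta, e, c):
    """psi = v - (1 - F)/f * dv/dtheta, where dv/dtheta = B(e) = e here."""
    return value(theta, e, c) - inverse_hazard(theta) * e


def alpha(theta):
    """alpha = 1 - (1 - F)/f * A'(theta)/A(theta), with A'(theta) = 1."""
    return 1.0 - inverse_hazard(theta) / theta


def beta(theta, c):
    """beta = (alpha - 1) * C."""
    return (alpha(theta) - 1.0) * c


def virtual_value_affine(theta, e, c):
    """Affine form of the virtual value: alpha * v + beta."""
    return alpha(theta) * value(theta, e, c) + beta(theta, c)


if __name__ == "__main__":
    for theta in (0.25, 0.5, 0.9):
        direct = virtual_value_direct(theta, e=2.0, c=0.3)
        affine = virtual_value_affine(theta, e=2.0, c=0.3)
        print(theta, direct, affine)
```

Under these assumptions $\alpha_i(\theta) = 2 - 1/\theta$, which is increasing in $\theta$, in line with the monotonicity that the later analysis relies on.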



4.2 Allocation Rule 

To find the optimal allocation, we construct the following $(k + 1)$-armed bandit process (where $k$
is the number of agents), augmented with a 0-th arm which corresponds to a non-transitioning arm
that always pays 0. For a vector $r \in \mathbb{R}^k$, define the weighted social welfare as:

$$
W^{r}(\theta, e, \rho) = \mathbb{E}\left[\sum_{t=1}^{\infty}\sum_{i=1}^{k} \delta^{t-1}\, q_{i,t} \left(\alpha_i(r_i)\, v_i(\theta_i, e_{i,t}, \rho_{i,t}) + \beta_i(r_i, \rho_{i,t})\right) \,\middle|\, \theta,\ e_1 = e,\ \rho_1 = \rho\right]
\tag{8}
$$

where for $i = 0$, $v_0(\cdot) = \alpha_0(\cdot) = \beta_0(\cdot, \cdot) = 0$. Note that the vector $r$ substitutes $\theta$ in the $\alpha$ and $\beta$
components of the virtual value (cf. Eq. (7)). With this structure, for $r = \theta$, $W^{\theta}(\theta, e, \rho)$ represents
the virtual surplus and the desired revenue of the mechanism (see Corollary 3.1).

We use the initial phase $t = 0$ to allow the agents to set $r$. Subsequently, the allocation rule we
use is the one that maximizes the weighted social welfare for this given $r$. In particular, for any
$r$, we can find the algorithm that maximizes the weighted social welfare using the Gittins index
(see Gittins [1989], Whittle [1982]).



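Definition 4.3 (Virtual index). For each agent $i$, the virtual index is defined as:

$$
\mathcal{G}_i^{r}(\theta_i, e_i, \rho_i) = \max_{\tau_i}\,
\frac{\mathbb{E}\left[\sum_{t=1}^{\tau_i} \delta^{t-1}\, \zeta_i^{r}(\theta_i, e_{i,t}, \rho_{i,t}) \,\middle|\, e_{i,1} = e_i,\ \rho_{i,1} = \rho_i\right]}
{\mathbb{E}\left[\sum_{t=1}^{\tau_i} \delta^{t-1} \,\middle|\, e_{i,1} = e_i,\ \rho_{i,1} = \rho_i\right]}
\tag{9}
$$

where the maximum is taken over all stopping times $\tau_i$ and

$$
\zeta_i^{r}(\theta_i, e_{i,t}, \rho_{i,t}) = \alpha_i(r_i)\, v_i(\theta_i, e_{i,t}, \rho_{i,t}) + \beta_i(r_i, \rho_{i,t}).
\tag{10}
$$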



It is well-known that the Gittins index policy (that which chooses the arm with highest index) is
the (Bayes) optimal algorithm for multi-armed bandit problems. Hence, the allocation rule which
chooses the highest virtual index is the optimal algorithm for the $(k + 1)$-armed bandit process
where the goal is to maximize the weighted social welfare (where $r$ is some fixed vector).
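
For intuition about the ratio in the definition of the virtual index, the sketch below treats the simplest possible arm: one whose per-allocation rewards are a known, finite, deterministic sequence. This is a simplifying assumption made only for illustration. In that case stopping times reduce to fixed horizons, and the index is the best ratio of discounted reward to discounted time over those horizons. The function names `gittins_index_deterministic` and `pick_arm` are hypothetical, and the maximum is taken only over horizons within the listed rewards. Stochastic arms, as in the paper's setting, require a genuine optimal-stopping computation that this sketch does not attempt.

```python
def gittins_index_deterministic(rewards, delta):
    """Index of an arm with a known, finite, deterministic reward stream.

    Returns the maximum over horizons tau = 1..len(rewards) of
        sum_{t=1..tau} delta**(t-1) * rewards[t-1]
        ------------------------------------------
               sum_{t=1..tau} delta**(t-1)
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    best = float("-inf")
    discounted_reward = 0.0  # numerator accumulated up to the current horizon
    discounted_time = 0.0    # denominator accumulated up to the current horizon
    weight = 1.0             # delta**(t-1) for the current step
    for r in rewards:
        discounted_reward += weight * r
        discounted_time += weight
        best = max(best, discounted_reward / discounted_time)
        weight *= delta
    return best


def pick_arm(streams, delta):
    """Index policy: return the position of the arm with the highest index."""
    indices = [gittins_index_deterministic(s, delta) for s in streams]
    return max(range(len(streams)), key=lambda a: indices[a])


if __name__ == "__main__":
    arms = [[2.0, 0.0], [1.0, 3.0, 0.0]]
    print([gittins_index_deterministic(s, 0.5) for s in arms])
    print(pick_arm(arms, 0.5))
```

In the Virtual Index Mechanism the per-step reward of arm $i$ would be the weighted virtual value $\zeta_i^{r}$ rather than a raw payoff; the sketch only illustrates the index computation and the "highest index wins" rule.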

We now describe the allocation (specified in Figure 1), including reports and allocations. At
time $t = 0$, each agent reports $\hat{\theta}_{i,0}$, a report of their initial type $\theta_i$. This initial report is used to set $r$ above;
in other words, their initial report determines the weights of the weighted social welfare function
which the subsequent allocation tries to maximize. At every subsequent time $t \ge 1$, they report
$\hat{\theta}_{i,t}$ and $\hat{e}_{i,t}$ (the component $\rho_{i,t}$ is observed directly by the mechanism). This "freedom to correct"
earlier misreports of $\theta_i$ leads to the stronger (periodic ex-post) notion of incentive compatibility.

4.3 Payment Rule

We now construct a payment scheme that makes the mechanism periodic ex-post incentive
compatible. It turns out that it is only in the fictitious $t = 0$ phase that all agents could potentially
make a payment; thereafter, only the agent who is allocated pays. While, as specified in Figure
1, these payments occur before the allocation, this is inconsequential (as they could occur after the
first allocation with no change to any of our guarantees).

As mentioned, the mechanism is reminiscent of a static (affine) VCG mechanism, but with
the added dynamic twist of a time-varying, additive offset. In Cavallo et al. [2006] and Bergemann and
Välimäki [2007], the payment of an agent after the allocation is equal to the externality
he imposes on other agents. In our context, the payment is an affine transformation of the
externality imposed. In particular, if at time $t$ the item is allocated to agent $i$, then $i$ pays the following
amount:



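$$
p_{i,t} = \frac{(1 - \delta)\, W^{\hat{\theta}_0}_{-i,t}(\hat{\theta}_t, \hat{e}_t, \rho_t) - \beta_i(\hat{\theta}_{i,0}, \rho_{i,t})}{\alpha_i(\hat{\theta}_{i,0})}
\tag{11}
$$

where $W^{\hat{\theta}_0}_{-i,t}$ is the optimal virtual surplus of the other agents. Namely,

$$
W^{r}_{-i,t}(\theta, e, \rho) = \max_{q \in Q}\, \mathbb{E}\left[\sum_{t'=t}^{\infty}\sum_{j \ne i} \delta^{t'-t}\, q_{j,t'}\, \zeta_j^{r}(\theta_j, e_{j,t'}, \rho_{j,t'}) \,\middle|\, \theta,\ e_t = e,\ \rho_t = \rho\right]
$$

where $Q$ is the set of allocation rules (and $j$ sums over the other $k$ arms, including the
non-paying arm 0). Also, $\zeta$ is defined in Eq. (10).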
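
The arithmetic of the per-period payment can be sketched as follows. The formula used here, $p_{i,t} = \big((1-\delta)\,W_{-i,t} - \beta_i\big)/\alpha_i$, is the form implied by the identity $\alpha_i v_i + \beta_i - (1-\delta) W_{-i} = \alpha_i (v_i - p_{i,t})$ that appears in the proof of Lemma 4.2. The function name and the numbers are hypothetical, and the sketch assumes $\alpha_i > 0$ and takes $W_{-i,t}$ as a given input rather than computing it.

```python
def virtual_index_payment(w_minus_i, alpha_i, beta_i, delta):
    """Payment of the allocated agent: ((1 - delta) * W_{-i} - beta_i) / alpha_i.

    w_minus_i : optimal virtual surplus of the other agents (given, not computed)
    alpha_i   : weight alpha_i evaluated at the agent's initial report
    beta_i    : offset beta_i evaluated at the initial report and public state
    delta     : discount factor in (0, 1)
    """
    if alpha_i <= 0:
        raise ValueError("this sketch assumes alpha_i > 0")
    return ((1.0 - delta) * w_minus_i - beta_i) / alpha_i


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the formula.
    alpha_i, beta_i, delta, w_minus_i, v = 0.5, -0.5, 0.9, 10.0, 4.0
    p = virtual_index_payment(w_minus_i, alpha_i, beta_i, delta)
    # Scaled utility equals the agent's marginal contribution to virtual surplus.
    lhs = alpha_i * (v - p)
    rhs = alpha_i * v + beta_i - (1.0 - delta) * w_minus_i
    print(p, lhs, rhs)
```

The equality of the two printed quantities is what aligns the agent's own utility with the weighted virtual surplus, up to the positive scaling by $\alpha_i$.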

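Finally, we specify the payment at time 0, $p_{i,0}$. First define $P_i(\theta)$ as:

$$
P_i(\theta) = V_i(\theta) - \int_0^{\theta_i} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q^{z}_{i,t}\, \frac{\partial v_i(z, e_{i,t}, \rho_{i,t})}{\partial z}\right] dz
\tag{12}
$$

Note this is the desired payment of agent $i$ conditioned on the vector $\hat{\theta}_0$ of reports of initial
type at time 0; this revenue achieves the upper bound in Corollary 3.1. The price charged (to
each agent $i$) is $P_i(\hat{\theta}_0)$ with a negative term to offset all expected future payments:

$$
p_{i,0} = P_i(\hat{\theta}_0) - \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, p_{i,t} \,\middle|\, \hat{\theta}_0\right]
\tag{13}
$$

This offset allows, in expectation, the revenue to be $P_i(\hat{\theta}_0)$ as desired. The heart of the proof is
verifying incentive compatibility with respect to the initial report $r$.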



4.4 Analysis of Theorem 4.1

We now give the outline of the proof of Theorem 4.1 using a series of lemmas. The proofs of these
lemmas are given in Appendix A.2. In the discussion of this subsection, we focus on the issue of
incentive compatibility, and address the issue of individual rationality in the proof of the lemmas.

The first step of the proof is to show that the mechanism is periodic ex-post incentive compatible
for periods $t \ge 1$, irrespective of the reports of the initial types at the (fictitious) period 0. Recall
that the mechanism implements an efficient allocation with respect to the "weights" that are
assigned as a function of the initial reports; the proof technique is similar to Bergemann and Välimäki
[2007].

Lemma 4.2. Let Assumption 4.1 hold. Then, for any initial report $\hat{\theta}_0$, the Virtual Index
Mechanism is periodic ex-post incentive compatible and individually rational for periods $t \ge 1$.

The lemma above guarantees that, under the Virtual Index Mechanism, it is always a best
response for agents to report their types truthfully regardless of the history, at any time $t \ge 1$
(assuming that other agents will be truthful in the future).

Therefore, we need only concern ourselves with period 0 deviations from the truthful strategy.
To obtain incentive compatibility at time 0, we need some notion of monotonicity of the future
allocations with respect to the period 0 report. The following lemma provides this monotonicity
result.

Before we show the monotonicity lemma, we introduce some notation. Denote the Virtual Index
(immediate) allocation rule by $q^{r}(\theta, e, \rho)$, where $r$ is the initial report. That is, for each $r$, we have
a (subsequent) allocation rule $q^{r}(\cdot)$ which, at any time $t \ge 1$, assigns the item based on the reported
state $(\theta_t, e_t, \rho_t)$.

Lemma 4.3. (Monotonic Allocation) Let Assumptions 4.1, 4.2 and 4.3 hold. Then, for all (joint)
states $(\theta, e, \rho)$ and any two initial reports $r$ and $r'$ which differ only in the $i$-th coordinate, with
$r_i \ge r'_i$, we have for the Virtual Index Mechanism that

$$
q_i^{r}(\theta, e, \rho) \ge q_i^{r'}(\theta, e, \rho).
$$

Note that the lemma above defines an instantaneous notion of monotonicity: it establishes that
at any time $t \ge 1$ and any reported state $(\theta_t, e_t, \rho_t)$, the allocation of the item to agent $i$ is more
likely if his first period report $\hat{\theta}_{i,0}$ is higher. The underlying multi-armed bandit stochastic process
guarantees that this notion of monotonicity is sufficient for agent $i$'s expected discounted sum of
all his future values to be monotonic in $\hat{\theta}_{i,0}$. We, therefore, obtain in the following lemma that
the Virtual Index Mechanism is also incentive compatible at period 0. This lemma is the key
component of our technical argument.

Lemma 4.4. Under Assumptions 4.1, 4.2 and 4.3, the Virtual Index Mechanism is both periodic
ex-post incentive compatible and individually rational.

We can now state the proof of our main theorem.

Proof of Theorem 4.1. From Lemma 4.4 we obtain that the Virtual Index Mechanism is both
periodic ex-post incentive compatible and individually rational. Hence, from Eq. (13), we get that
the revenue produced by the Virtual Index Mechanism is equal to

$$
\sum_{i=1}^{k} \mathbb{E}\left[p_{i,0} + \sum_{t=1}^{\infty} \delta^{t-1}\, p_{i,t}\right] = \sum_{i=1}^{k} \mathbb{E}\left[P_i(\theta)\right],
$$

where $P_i(\cdot)$ is as defined in Eq. (12). This value is equal to the bound given in Corollary 3.1.
Therefore, the mechanism is optimal. □

A Appendix

A.1 Analysis of Theorem 3.1



The proof of this theorem follows the outline of Myerson [1981] for establishing incentive
compatibility in static mechanism design. In Lemma A.1 we derive an envelope condition that the utility
of players participating in an incentive compatible mechanism must satisfy. We then use this envelope
condition to derive the desired result.

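Lemma A.1. Assume complete dynamic monitoring. Assume as well that the partial derivative
$\partial v_i(\theta_i, e_{i,t}, \rho_{i,t})/\partial \theta_i$ exists for all $\theta_i$, $e_{i,t}$ and $\rho_{i,t}$ and there exists some $B < \infty$ such that
$\left|\partial v_i(\theta_i, e_{i,t}, \rho_{i,t})/\partial \theta_i\right| \le B$ for all $\theta_i$, $e_{i,t}$ and $\rho_{i,t}$. Then, any incentive compatible mechanism
$\mathcal{M}$ satisfies, for all $i$, $\theta_i$ and $\theta_{-i}$, with all players other than $i$ being truthful,

$$
U_i^{\mathcal{M}}(\theta_i, \theta_{-i}) - U_i^{\mathcal{M}}(0, \theta_{-i}) = \int_0^{\theta_i} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q_{i,t}\, \frac{\partial v_i(z, e_{i,t}, \rho_{i,t})}{\partial z} \,\middle|\, \theta_i = z,\ \theta_{-i}\right] dz.
\tag{14}
$$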



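Proof. In a direct mechanism in this setting, the agents' only report is $\theta$ at the initial period. The
mechanism design problem is therefore static and we use the classical results from Milgrom and Segal
[2002] (Theorem 2) to obtain our envelope condition. We now show that these conditions are
satisfied here.

For any mechanism $\mathcal{M}$ and initial type profile $\theta$, let the expected utility of player $i$ of type $\theta_i$ reporting
type $\hat{\theta}_i$ be given by

$$
U_i^{\mathcal{M}}(\hat{\theta}_i, \theta_i \mid \theta_{-i}) = \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1} \left(q_{i,t}\, v_i(\theta_i, e_{i,t}, \rho_{i,t}) - p_{i,t}\right)\right],
$$

where the allocations $q_{i,t}$ and payments $p_{i,t}$ are those induced by the report profile $(\hat{\theta}_i, \theta_{-i})$.

Consider the term $U_i^{\mathcal{M}}(\hat{\theta}_i, \cdot \mid \theta_{-i})$ applied to two different values $\theta_i$ and $\theta'_i$. Taking the difference
between the two and dividing by $\theta_i - \theta'_i$, we obtain

$$
\frac{U_i^{\mathcal{M}}(\hat{\theta}_i, \theta_i \mid \theta_{-i}) - U_i^{\mathcal{M}}(\hat{\theta}_i, \theta'_i \mid \theta_{-i})}{\theta_i - \theta'_i}
= \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q_{i,t}\, \frac{v_i(\theta_i, e_{i,t}, \rho_{i,t}) - v_i(\theta'_i, e_{i,t}, \rho_{i,t})}{\theta_i - \theta'_i}\right].
$$

Since $|q_{i,t}| \le 1$ for all $\theta$, $e_{i,t}$ and $\rho_{i,t}$, and the partial derivative $\partial v_i(\theta_i, e_{i,t}, \rho_{i,t})/\partial \theta_i$ exists for all $\theta$, $e_{i,t}$ and
$\rho_{i,t}$ and is bounded by $B$, we use Lebesgue's Dominated Convergence Theorem to obtain that the
partial derivative $\partial U_i^{\mathcal{M}}(\hat{\theta}_i, \theta_i \mid \theta_{-i})/\partial \theta_i$ exists and satisfies

$$
\frac{\partial U_i^{\mathcal{M}}(\hat{\theta}_i, \theta_i \mid \theta_{-i})}{\partial \theta_i}
= \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q_{i,t}\, \frac{\partial v_i(\theta_i, e_{i,t}, \rho_{i,t})}{\partial \theta_i}\right].
$$

Furthermore, since $|q_{i,t}| \le 1$ and the derivative of $v_i$ is bounded by $B$,

$$
\left|\frac{\partial U_i^{\mathcal{M}}(\hat{\theta}_i, \theta_i \mid \theta_{-i})}{\partial \theta_i}\right| \le \sum_{t=1}^{\infty} \delta^{t-1} B = \frac{B}{1 - \delta}
$$

for all $\theta_i$ and $\hat{\theta}_i$ and, therefore, $U_i^{\mathcal{M}}(\hat{\theta}_i, \cdot \mid \theta_{-i})$ is absolutely
continuous for any $\hat{\theta}_i$ and $\theta_{-i}$. Therefore, the function $U_i^{\mathcal{M}}(\hat{\theta}_i, \theta_i \mid \theta_{-i})$ satisfies the conditions of
Milgrom and Segal [2002], Theorem 2, yielding Eq. (14). □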



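Proof of Theorem 3.1. Consider first the utility $U_i^{\mathcal{M}}(\theta)$ of an agent $i$ under an initial type profile
$\theta$, which satisfies

$$
U_i^{\mathcal{M}}(\theta) - U_i^{\mathcal{M}}(0, \theta_{-i}) = \int_0^{\theta_i} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q_{i,t}\, \frac{\partial v_i(z, e_{i,t}, \rho_{i,t})}{\partial z} \,\middle|\, \theta_i = z,\ \theta_{-i}\right] dz
$$

from Eq. (14). Taking the expectation of this term over all possible type profiles $\theta$, we obtain

$$
\mathbb{E}\left[U_i^{\mathcal{M}}(\theta) - U_i^{\mathcal{M}}(0, \theta_{-i})\right]
= \mathbb{E}_{\theta_{-i}}\left[\int_0^{\infty}\!\!\int_0^{\theta_i} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q_{i,t}\, \frac{\partial v_i(z, e_{i,t}, \rho_{i,t})}{\partial z} \,\middle|\, \theta_i = z,\ \theta_{-i}\right] dz\, f_i(\theta_i)\, d\theta_i\right].
$$

Inverting the order of integration,

$$
\begin{aligned}
\mathbb{E}\left[U_i^{\mathcal{M}}(\theta) - U_i^{\mathcal{M}}(0, \theta_{-i})\right]
&= \mathbb{E}_{\theta_{-i}}\left[\int_0^{\infty}\!\!\int_z^{\infty} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q_{i,t}\, \frac{\partial v_i(z, e_{i,t}, \rho_{i,t})}{\partial z} \,\middle|\, \theta_i = z,\ \theta_{-i}\right] f_i(\theta_i)\, d\theta_i\, dz\right] \\
&= \mathbb{E}_{\theta_{-i}}\left[\int_0^{\infty} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q_{i,t}\, \frac{\partial v_i(z, e_{i,t}, \rho_{i,t})}{\partial z} \,\middle|\, \theta_i = z,\ \theta_{-i}\right] \left(1 - F_i(z)\right) dz\right].
\end{aligned}
$$

By multiplying and dividing the right-hand side of the equation above by the density $f_i(z)$ we
obtain an unconditional expectation:

$$
\mathbb{E}\left[U_i^{\mathcal{M}}(\theta) - U_i^{\mathcal{M}}(0, \theta_{-i})\right]
= \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q_{i,t}\, \frac{1 - F_i(\theta_i)}{f_i(\theta_i)}\, \frac{\partial v_i(\theta_i, e_{i,t}, \rho_{i,t})}{\partial \theta_i}\right].
\tag{15}
$$

Now note that the total revenue of the mechanism is given by the sum of the payments from the
agents, which themselves are the difference between the value of the allocations to the agents and
the utility they obtain, i.e.,

$$
\mathrm{Rev}^{\mathcal{M}} = \sum_{i=1}^{k} \mathbb{E}\left[P_i^{\mathcal{M}}(\theta)\right] = \sum_{i=1}^{k} \mathbb{E}\left[V_i^{\mathcal{M}}(\theta) - U_i^{\mathcal{M}}(\theta)\right].
$$

Combining the definition of the value $V_i^{\mathcal{M}}(\theta)$ with Eq. (15),

$$
\mathrm{Rev}^{\mathcal{M}} = \sum_{i=1}^{k} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q_{i,t}\, v_i(\theta_i, e_{i,t}, \rho_{i,t}) - \sum_{t=1}^{\infty} \delta^{t-1}\, q_{i,t}\, \frac{1 - F_i(\theta_i)}{f_i(\theta_i)}\, \frac{\partial v_i(\theta_i, e_{i,t}, \rho_{i,t})}{\partial \theta_i} - U_i^{\mathcal{M}}(0, \theta_{-i})\right]
$$

yields the desired result by plugging in the definition of the virtual value $\psi$. □

A.2 Omitted Proofs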



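Proof of Lemma 4.1. The claim for additive separability is trivial. In the multiplicatively
separable case, we have:

$$
\begin{aligned}
\psi_i(\theta_i, e_i, \rho_i)
&= A_i(\theta_i) B_i(e_i, \rho_i) - C_i(\rho_i) - \frac{1 - F_i(\theta_i)}{f_i(\theta_i)}\, A_i'(\theta_i)\, B_i(e_i, \rho_i) \\
&= \left(A_i(\theta_i) - \frac{1 - F_i(\theta_i)}{f_i(\theta_i)}\, A_i'(\theta_i)\right) B_i(e_i, \rho_i) - C_i(\rho_i) \\
&= \alpha_i(\theta_i)\, A_i(\theta_i)\, B_i(e_i, \rho_i) - C_i(\rho_i) \\
&= \alpha_i(\theta_i)\, v_i(\theta_i, e_i, \rho_i) + \left(\alpha_i(\theta_i) - 1\right) C_i(\rho_i) \\
&= \alpha_i(\theta_i)\, v_i(\theta_i, e_i, \rho_i) + \beta_i(\theta_i, \rho_i),
\end{aligned}
$$

and the claim follows. □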



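Proof of Lemma 4.2. The proof follows the outline of Bergemann and Välimäki [2007], with
"virtual values" replacing values as the mechanism's objective. In this proof we assume an initial
type report profile $\hat{\theta}_0$ (not necessarily truthful reports) and concern ourselves only with periods
$t \ge 1$. Let $W^{\mathcal{R}_i}(\theta, e, \rho)$ denote the discounted future virtual surplus, at time $t \ge 1$ and state
$(\theta, e, \rho)$, when agent $i$ uses reporting strategy $\mathcal{R}_i$ and other agents are truthful:

$$
W^{\mathcal{R}_i}(\theta, e, \rho) = \max_{q \in Q}\, \mathbb{E}\left[\sum_{t'=t}^{\infty}\sum_{j} \delta^{t'-t}\, q^{\mathcal{R}_i}_{j,t'} \left(\alpha_j(\hat{\theta}_{j,0})\, v_j(\theta_j, e_{j,t'}, \rho_{j,t'}) + \beta_j(\hat{\theta}_{j,0}, \rho_{j,t'})\right) \,\middle|\, \theta,\ e_t = e,\ \rho_t = \rho\right]
$$

where $Q$ is the set of all allocation rules and $q^{\mathcal{R}_i}_{j,t'}$ denotes the allocation induced by $\mathcal{R}_i$ (other agents
are assumed truthful).

Similarly define the virtual surplus without agent $i$:

$$
W_{-i,t}(\theta, e, \rho) = \max_{q \in Q_{-i}}\, \mathbb{E}\left[\sum_{t'=t}^{\infty}\sum_{j \ne i} \delta^{t'-t}\, q_{j,t'} \left(\alpha_j(\hat{\theta}_{j,0})\, v_j(\theta_j, e_{j,t'}, \rho_{j,t'}) + \beta_j(\hat{\theta}_{j,0}, \rho_{j,t'})\right) \,\middle|\, \theta,\ e_t = e,\ \rho_t = \rho\right]
$$

where $Q_{-i}$ is the set of allocation rules that never assign items to agent $i$. Note $W_{-i}$ is defined
without reference to $\mathcal{R}_i$ because agent $i$ does not affect the allocation. In addition, let $m_{i,t}$ denote
the marginal contribution of agent $i$, at time $t$, to the virtual surplus, i.e.,

$$
m_{i,t} = W^{\mathcal{R}_i}(\theta, e_t, \rho_t) - W_{-i}(\theta, e_t, \rho_t) - \delta\, \mathbb{E}\left[W^{\mathcal{R}_i}(\theta, e_{t+1}, \rho_{t+1}) - W_{-i}(\theta, e_{t+1}, \rho_{t+1})\right].
\tag{16}
$$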

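If the mechanism does not allocate the item to agent $i$ at time $t$, then $m_{i,t} = p_{i,t} = 0$. But, if
the mechanism does allocate the item to agent $i$ at time $t$, then

$$
W^{\mathcal{R}_i}(\theta, e_t, \rho_t) = \left(\alpha_i(\hat{\theta}_{i,0})\, v_i(\theta_i, e_{i,t}, \rho_{i,t}) + \beta_i(\hat{\theta}_{i,0}, \rho_{i,t})\right) + \delta\, \mathbb{E}\left[W^{\mathcal{R}_i}(\theta, e_{t+1}, \rho_{t+1})\right].
$$

Since an agent's state does not change in the absence of an allocation, if the item is allocated to $i$
at time $t$, we have

$$
W_{-i}(\theta, e_t, \rho_t) = W_{-i}(\theta, e_{t+1}, \rho_{t+1}).
$$

Hence, we obtain a marginal contribution (cf. Eq. (16)) for agent $i$ at time $t$ of

$$
\begin{aligned}
m_{i,t} &= \left(\alpha_i(\hat{\theta}_{i,0})\, v_i(\theta_i, e_{i,t}, \rho_{i,t}) + \beta_i(\hat{\theta}_{i,0}, \rho_{i,t})\right) - (1 - \delta)\, W_{-i}(\theta, e_t, \rho_t) \\
&= \alpha_i(\hat{\theta}_{i,0}) \left(v_i(\theta_i, e_{i,t}, \rho_{i,t}) - p_{i,t}\right),
\end{aligned}
$$

where the price $p_{i,t}$ is given in Eq. (11). Noting that $m_{i,t}$ can only be different from zero if $q^{\mathcal{R}_i}_{i,t} = 1$,
we obtain that the expected future utility of agent $i$ at time $t$ given reporting strategy $\mathcal{R}_i$ is

$$
\begin{aligned}
U^{\mathcal{R}_i}_{i,t}(\theta_i, e_{i,t}, \rho_{i,t})
&= \mathbb{E}\left[\sum_{t'=t}^{\infty} \delta^{t'-t} \left(q^{\mathcal{R}_i}_{i,t'}\, v_i(\theta_i, e_{i,t'}, \rho_{i,t'}) - p_{i,t'}\right)\right] \\
&= \frac{1}{\alpha_i(\hat{\theta}_{i,0})}\, \mathbb{E}\left[\sum_{t'=t}^{\infty} \delta^{t'-t}\, m_{i,t'}\right] \\
&= \frac{1}{\alpha_i(\hat{\theta}_{i,0})} \left(W^{\mathcal{R}_i}(\theta, e_t, \rho_t) - W_{-i}(\theta, e_t, \rho_t)\right).
\end{aligned}
$$

Note that $W_{-i}$ is independent of all of agent $i$'s reports. Also, $W^{\mathcal{R}_i}(\theta_t, e_t, \rho_t)$ is maximized if
$i$ reports truthfully since $W$ is defined as the maximum virtual surplus obtained by an allocation
with respect to the true (joint) state. Therefore, we obtain that the mechanism is periodic ex-post
incentive compatible. Observe as well that $W^{\mathcal{T}} \ge W_{-i}$, where $\mathcal{T}$ denotes the truthful strategy,
yielding that the mechanism is also periodic ex-post individually rational.

□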

□ 



Proof of Lemma 4.3. The Virtual Index Mechanism allocates the item to the agent with the
highest "virtual index". Therefore, it is sufficient to show that the virtual value $\psi_i$ is non-decreasing
in the initial report of the agent. By Lemma 4.1, it thus suffices to show that both $\alpha_i(\cdot)$ and $\beta_i(\cdot, \rho_i)$ are
non-decreasing functions of $\theta_i$ under the assumptions of the lemma. With this it directly follows that
the virtual index is monotonic in the initial reports. Let $\eta_i(\theta_i)$ denote the hazard rate, i.e.,

$$
\eta_i(\theta_i) = \frac{f_i(\theta_i)}{1 - F_i(\theta_i)}.
$$

In the additive case, $\alpha_i = 1$ and $\beta_i(\theta_i, \rho_i) = -A_i'(\theta_i, \rho_i)/\eta_i(\theta_i)$, so

$$
\beta_i'(\theta_i, \rho_i) = -\frac{A_i''(\theta_i, \rho_i)}{\eta_i(\theta_i)} + \frac{A_i'(\theta_i, \rho_i)\, \eta_i'(\theta_i)}{\eta_i(\theta_i)^2},
$$

where $(\cdot)'$ denotes a partial derivative with respect to $\theta_i$. By the assumptions that $A_i$ is concave
and non-decreasing, and the hazard rate is non-negative and increasing, we have that the above is
non-negative.

In the multiplicative case, first note that $\alpha_i(\theta_i) = 1 - \frac{1}{\eta_i(\theta_i)} \left(\log A_i(\theta_i)\right)'$, so

$$
\alpha_i'(\theta_i) = -\frac{\left(\log A_i(\theta_i)\right)''}{\eta_i(\theta_i)} + \frac{\left(\log A_i(\theta_i)\right)'\, \eta_i'(\theta_i)}{\eta_i(\theta_i)^2},
$$

which is non-negative by the assumptions. Since $\beta_i = (\alpha_i - 1) C_i$ and since $C_i$ is non-negative, $\beta_i$ is also
non-decreasing. □

Proof of Lemma 4.4. Let $U_i(\theta)$ be the expected utility of $i$ under the Virtual Index Mechanism
conditioned on the initial types being $\theta$ (and everyone behaving truthfully). By construction, see
Eq. (12), we have:

$$
U_i(\theta) = V_i(\theta) - P_i(\theta) = \int_0^{\theta_i} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q^{z}_{i,t}\, \frac{\partial v_i(z, e_{i,t}, \rho_{i,t})}{\partial z}\right] dz
\tag{17}
$$

Here, we are slightly abusing notation in that we are only specifying the superscript with respect
to $z$ in $q$ (it implicitly also depends on $\theta_{-i}$). Observe that $U_i(\theta) = 0$ when $\theta_i = 0$. Moreover, because $v_i$ is
increasing in $\theta_i$, $U_i$ is non-negative. Hence, the mechanism is ex-post individually rational with
respect to the (first) report of the initial type. We now prove that the mechanism is ex-post
incentive compatible with respect to the (first) report of the initial type.

Let $U_i(\theta; \theta'_i)$ be the utility of agent $i$ conditioned on: the initial types being $\theta$; the initial report
in the 0-th period of $i$ being $\theta'_i$; where $i$ behaves optimally thereafter; and where all other agents
behave truthfully (conditioned on their initial type and report being $\theta_{-i}$). To prove (periodic
ex-post) incentive compatibility, we must show that for all $\theta$ and all $\theta'_i$,

$$
U_i(\theta) \ge U_i(\theta; \theta'_i).
$$

Let $\theta'$ equal $\theta$ in all coordinates except in coordinate $i$, where it equals $\theta'_i$. Lemma 4.2 shows
that truthfulness is the optimal continuation strategy from time $t \ge 1$ onwards. Hence,

$$
U_i(\theta') = U_i(\theta'; \theta'_i).
$$

Thus, it suffices to show:

$$
U_i(\theta) - U_i(\theta') \ge U_i(\theta; \theta'_i) - U_i(\theta'; \theta'_i)
\tag{18}
$$

for all $\theta$ and $\theta'_i$. Now we write this condition more explicitly.

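Now let us consider the mechanism $q^{\theta'}(\cdot)$ from $t \ge 1$, where $\theta'$ is fixed. Let $U_i^{\theta'}(\theta)$ be the utility
of this mechanism from $t \ge 1$ onwards (where truthful reporting occurs). By Lemma 4.2 we have
that $q^{\theta'}$ is an incentive compatible allocation. Hence, the envelope lemma implies$^{1}$

$$
U_i^{\theta'}(\theta) - U_i^{\theta'}(\theta') = \int_{\theta'_i}^{\theta_i} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q^{\theta'}_{i,t}\, \frac{\partial v_i(z, e_{i,t}, \rho_{i,t})}{\partial z}\right] dz.
$$

Now note that at $t = 0$ the charge only depends on the initial report (e.g., $\theta'$). Hence,

$$
U_i(\theta; \theta'_i) - U_i(\theta'; \theta'_i) = U_i^{\theta'}(\theta) - U_i^{\theta'}(\theta') = \int_{\theta'_i}^{\theta_i} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q^{\theta'}_{i,t}\, \frac{\partial v_i(z, e_{i,t}, \rho_{i,t})}{\partial z}\right] dz
\tag{19}
$$

where now the evolution of the process is under the mechanism $q^{\theta'}$.
Combining Eqs. (17) and (19), we have that Eq. (18) is equivalent to:

$$
\int_{\theta'_i}^{\theta_i} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q^{z}_{i,t}\, \frac{\partial v_i(z, e_{i,t}, \rho_{i,t})}{\partial z}\right] dz
\ \ge\
\int_{\theta'_i}^{\theta_i} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1}\, q^{\theta'}_{i,t}\, \frac{\partial v_i(z, e'_{i,t}, \rho'_{i,t})}{\partial z}\right] dz.
$$

$^{1}$The envelope lemma requires that $\left|\partial v_i(\theta_i, e_i, \rho_i)/\partial \theta_i\right| \le B$ for some $B < \infty$ for all $\theta_i$, $e_i$ and $\rho_i$. Assumption 4.2,
together with the compact support of $\theta_i$, guarantees that this condition holds.

We can couple the two expectations by sampling the initial types and an infinite sequence of
experiences; each process is then evolved on this sample using the appropriate allocations. Thus,
this condition is equivalent to:

$$
\int_{\theta'_i}^{\theta_i} \mathbb{E}\left[\sum_{t=1}^{\infty} \delta^{t-1} \left(q^{z}_{i,t}\, \frac{\partial v_i(z, e_{i,t}, \rho_{i,t})}{\partial z} - q^{\theta'}_{i,t}\, \frac{\partial v_i(z, e'_{i,t}, \rho'_{i,t})}{\partial z}\right)\right] dz \ \ge\ 0
$$

where $e'_{i,t}$ and $\rho'_{i,t}$ denote the experiences under the allocation $q^{\theta'}$.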

Now let $\tau_k$ be the time at which the $k$-th allocation to $i$ occurs under $q^{z}$ (assumed to
be infinite if the $k$-th allocation does not occur), and let $\tau'_k$ be this time under $q^{\theta'}$. By our
coupled process, we have that $e_{i,\tau_k}$ equals $e'_{i,\tau'_k}$ for all $k$. This is because after $k$ allocations, both
private experiences have advanced precisely $k - 1$ times. Similarly, the same is true for the public
experiences. Hence, the above condition is equivalent to:

$$
\int_{\theta'_i}^{\theta_i} \mathbb{E}\left[\sum_{k=1}^{\infty} \left(\delta^{\tau_k - 1} - \delta^{\tau'_k - 1}\right) \frac{\partial v_i(z, e_{i,\tau_k}, \rho_{i,\tau_k})}{\partial z}\right] dz \ \ge\ 0
$$

since the experiences at $\tau_k$ and $\tau'_k$ are identical.

Without loss of generality, assume that $\theta_i > \theta'_i$. To complete the proof of incentive compatibility,
we show that $\tau'_k \ge \tau_k$, using the monotonicity of the allocation (Lemma 4.3). This lemma implies that:

$$
q_i^{\theta'}(z, e_t, \rho_t) \le q_i^{z}(z, e_t, \rho_t).
$$

We proceed inductively. For $k = 1$, note that until the first allocation occurs in either process,
the state of $i$ is identical at each timestep under both allocations; this is because the Gittins
allocation is an index mechanism (so the states of the other agents advance identically when $i$ is
not present). Hence, monotonicity implies that the first allocation under $q^{\theta'}$ cannot occur before
that time under $q^{z}$. Taking the inductive step, assume that $\tau'_{k-1} \ge \tau_{k-1}$. If $\tau'_{k-1} \ge \tau_k$, then we
are done. If not, this implies that at time $\tau'_{k-1}$, agent $i$ has been allocated precisely $k - 1$ times in both
processes, and so she has an identical state in either process. Hence, an identical argument to that
of the first period shows that the $k$-th allocation (if it occurs) under $q^{\theta'}$ cannot occur before that
under $q^{z}$, which completes the proof. □